Abstract

Background: The Brassica B genome is known to carry several important traits, yet there has been limited analysis
of its underlying genome structure, especially in comparison to the closely related A and C genomes. A bacterial
artificial chromosome (BAC) library of Brassica nigra was developed and screened with 17 genes from a 222 kb
region of A. thaliana that had been well characterised in both the Brassica A and C genomes.

Results: Fingerprinting of 483 apparently non-redundant clones defined physical contigs for the corresponding
regions in B. nigra. The target region is duplicated in A. thaliana, and six homologous contigs were found in B. nigra,
resulting from the whole genome triplication event shared by the Brassiceae tribe. BACs representative of each
region were sequenced to elucidate the level of microscale rearrangements across the Brassica species divide.

Conclusions: Although the B genome species separated from the A/C lineage some 6 Mya, comparisons between 
the three paleopolyploid Brassica genomes revealed extensive conservation of gene content and sequence identity. 
The level of fractionation or gene loss varied across genomes and genomic regions; however, the greatest loss of 
genes was observed to be common to all three genomes. One large-scale chromosomal rearrangement
differentiated the B genome, suggesting that such events could contribute to the lack of recombination observed
between B genome species and those of the closely related A/C lineage.

Keywords: Brassiceae, Brassica nigra, Sequence analyses, Speciation, Genome organization, Collinearity, Divergence

Background

Black mustard (Brassica nigra (L.) Koch, 2n = 16, BB genome) represents the diploid Brassica B genome. It is
an outcrossing oilseed species [1] which can also be used as a condiment [2]. However, it is a relatively minor
crop compared with the closely related mustard species Brassica juncea (L.) Czern (AABB genome) and Brassica
carinata Braun (BBCC genome), which are allotetraploid species that share the B genome [1]. Most studies within
the Brassiceae have focused on the widely cultivated Brassica species, Brassica rapa L. (AA genome), Brassica
oleracea L. (CC genome), and Brassica napus L. (AACC genome) [3]. However, the diploid Brassica B genome is
considered to be an important source of useful genes in Brassica breeding, including genes for drought tolerance,
disease resistance, and oil seed quality [4-9]. In order to exploit the variation found within the B genome for
the breeding of other Brassica oilseeds, an understanding of the relationship between the three Brassica genomes
is required.

Although there has been extensive evidence of pairing and recombination between the A and C Brassica genomes
[10-12], the B genome appears unusual since no or limited recombination between the B genome and its related A
and C homologues has been observed [13-17]. The reason for this difference can be hypothesized as either the
result of genetic factors regulating homologous pairing in the B genome, similar to those suggested for the A and
C genomes [18-21], or significant structural divergence of the B genome relative to the A and C genomes. Although
it has been suggested through limited sequence and comparative mapping data that the B genome may have diverged
to such an extent that the lack of recombination can be explained [22], questions still arise as to the accuracy
of this assertion. Uncovering the level of homology at the micro level between the three Brassica diploid genomes
should provide insights into the true relationship between the genomes.

The cultivated Brassica species are closely related to the fully sequenced model plant Arabidopsis thaliana [23],
since the Brassica and Arabidopsis lineages diverged between 14.5 and 20 million years ago (Mya) [24]. It is
believed that B. nigra evolved from the same polyploidy events, and resultant paleohexaploid, which took place
around 7.9 to 14.6 Mya within the Brassica lineage. Although there are good estimates placing the separation of
B. rapa (A genome) and B. oleracea (C genome) at 3.7 Mya [25], there is little data to date the separation of the
B genome from the A/C lineage [26]. Comparative mapping has identified blocks of conserved gene content and order
between the A. thaliana genome and the Brassica A and C genomes of B. napus, where 21 ancestral segments of the
A. thaliana genome could be replicated and rearranged to represent the Brassica genomes [27]. This work and
further mapping in related species were used to propose an ancestral karyotype, reconstructed from 24 conserved
blocks, A-X, for the entire Brassicaceae family [28]. Physical mapping of the regions homologous to a 222 kb
region of A. thaliana chromosome 4 in B. oleracea [29], B. rapa [30] and B. napus [31] showed highly conserved
collinearity of the orthologous regions in the three genomes and corroborated the triplicated nature of the
diploid Brassica genomes. Previous sequence-level studies in B. oleracea [32], B. rapa [33] and B. napus [34]
have clarified aspects of genome evolution and organization in Brassica by exploiting the close relationship with
the genome of the model plant A. thaliana.

Physical mapping can provide an accurate representation of a genome when sequence is not available. A number of
large insert libraries, the foundation for most physical maps, have been developed for the Brassica A and C
genome species and have been exploited to facilitate map-based gene cloning for traits of interest and to provide
insights into the evolutionary mechanisms that have formed these complex genomes [29-32,35-39]. In order to
provide insights into the relatively uncharted Brassica B genome, we have developed a large insert library for
B. nigra and have used this tool to describe the physical organisation of six regions of the B genome of B. nigra
homologous to the same 222 kb region of A. thaliana, which was exploited previously to uncover aspects of genome
organisation and evolution in the Brassica A/C genomes [29,30,32,34]. We compared these orthologous regions of
the B genome with the equivalent regions of A. thaliana and the Brassica A and C genomes at the sequence level,
and report highly conserved gene content and order across the three genomes. An earlier divergence time separated
the B genome from the A and C Brassica lineage by some 3 million years, which is barely reflected in the level of
observed genome fractionation.

Results 

BGH BAC library characteristics 

In order to facilitate comparative mapping among the Brassica species, a BAC library was constructed from a
doubled haploid (DH) B. nigra line (No100). The B. nigra library (BGH) was generated from HindIII-digested
genomic DNA and includes 85,248 clones arrayed in 384-well plates. One clone was randomly selected from each of
100 plates. Restriction digestion of BAC DNA and resolution with pulsed-field gel electrophoresis allowed the
average insert size to be estimated at 130 kb. No empty clones were identified among the 100 selected. In order
to assess the number of empty clones more accurately, 20 plates (7,680 clones) were gridded in replicate onto
medium containing IPTG and X-gal, which resolved 141 (1.8%) empty clones. The library should provide
approximately 17.5X coverage of the B genome, which is estimated to be 632 Mb [40].
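
The coverage figure quoted above follows from simple arithmetic on the library statistics. The sketch below is only an illustration of that calculation; the function name and the optional empty-clone correction are our own assumptions and are not part of the original analysis.

```python
def library_coverage(n_clones, mean_insert_bp, genome_size_bp, empty_fraction=0.0):
    """Expected number of genome equivalents held in a BAC library.

    empty_fraction is an optional, illustrative correction for clones
    that carry no insert (hypothetical parameter, not from the paper).
    """
    usable_clones = n_clones * (1.0 - empty_fraction)
    return usable_clones * mean_insert_bp / genome_size_bp


# Values reported in the text: 85,248 clones, ~130 kb inserts, 632 Mb genome.
coverage = library_coverage(85_248, 130_000, 632_000_000)

# Empty-clone rate from the IPTG/X-gal screen: 141 of 7,680 gridded clones.
empty_fraction = 141 / 7_680

print(f"coverage ~ {coverage:.1f}X")
print(f"empty clones ~ {empty_fraction:.1%}")
print(f"coverage corrected for empty clones ~ "
      f"{library_coverage(85_248, 130_000, 632_000_000, empty_fraction):.1f}X")
```

With the reported numbers the uncorrected estimate rounds to the 17.5X stated above; applying the empty-clone rate lowers it only slightly.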

Identifying the B genome BACs homologous to the A. thaliana target region

We targeted the 222 kb region on A. thaliana chromosome 4, which had been studied previously in B. rapa [30],
B. oleracea [29] and B. napus [31] and is also partially duplicated on A. thaliana chromosome 5, resulting from
the known α-duplication event. Seventeen probes representing A. thaliana genes from this region were hybridised
to high-density colony filters of the BGH library. The probes identified 18-207 BACs each, with 207 BAC clones
being recognised by two or more probes. The resulting 1,110 BAC clones were end-sequenced, with successful
sequence being acquired from 851 clones. These sequence data were compared against each other and against the
A. thaliana gene complement using BLASTN, which identified 483 potentially non-redundant BAC clones for further
analyses. The resultant banding patterns of the HindIII-digested BAC clones hybridised with the gene-specific
probes were studied to confirm the initial data and to find those clones with shared loci (Additional file 1:
Table S1 and Figure 1). Of the 483 selected BACs, a total of 362 clones showed strong positive hybridisation for
the 17 gene-specific probes while 121 from this subset did not. The number of BACs and loci identified for each
gene-specific probe is reported in Table 1, with the number of homologous loci ranging between 1 and 4 for each
A. thaliana gene.

Figure 1 Brassica nigra contigs homologous to the target region. Contigs were assembled based on common HindIII digestion patterns
identified by two or more genes from the target region. Closed circles show hybridisation of the gene specific probe and open circles indicate
no hybridisation. Contigs aligned to A. thaliana chromosome 4 (a) and chromosome 5 (b).

Table 1 Details of probes used to screen B. nigra BAC library and the results from the hybridisation data

Gene specific probe   Old code   Number of positive clones   Number of loci   Mean redundancy
At4g17260             DL4665     28                          1                28.0
At4g17300             DL4685     37                          1                37.0
At4g17340             DL4705     77                          1                77.0
At4g17350             DL4710     54                          2                27.0
At4g17380             DL4725     17                          2                8.5
At4g17410             DL4740     43                          2                21.5
At4g17440             DL4755     83                          1                83.0
At4g17460             DL4765     51                          3                17.0
At4g17480             DL4775     35                          4                8.7
At4g17500             DL4785     40                          1                40.0
At4g17570             DL4820     58                          2                29.0
At4g17600             DL4835     107                         2                53.5
At4g17650             DL4860     21                          1                21.0
At4g17700             DL4885     10                          3                3.3
At4g17730             DL4900     67                          1                67.0
At4g17760             DL4915     19                          2                9.5
At4g17800             DL4935     85                          3                28.3
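
The mean redundancy column of Table 1 appears to be the number of positive clones divided by the number of loci detected by each probe. The short sketch below recomputes a few rows under that assumption; the function name and the choice of rows are ours, for illustration only.

```python
def mean_redundancy(positive_clones, loci):
    """Average number of positive BAC clones per homologous locus."""
    return positive_clones / loci


# (positive clones, loci) for three probes as listed in Table 1.
table1_rows = {
    "At4g17350": (54, 2),
    "At4g17460": (51, 3),
    "At4g17800": (85, 3),
}

for gene, (clones, loci) in table1_rows.items():
    print(f"{gene}: {mean_redundancy(clones, loci):.1f} clones per locus")
```

Comparing these per-locus values with the expected library coverage is one simple way to judge whether a probe detects roughly the number of clones that a single-copy locus should yield.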

Contigs of overlapping BAC clones were assembled based on common digestion patterns observed for two or more
genes. In addition, all 483 BAC clones were subjected to SNaPshot high information content fingerprinting (HICF)
and the FPC software was used to assemble contigs [38,41,42]. In FPC it is recommended to start building contigs
at high stringency to prevent chimeric joining of duplicated regions and to iteratively lower the stringency to
avoid gaps in the resultant physical map [43]. Therefore we started with a cut-off value of 1 × 10^-35 for
automatic contig assembly and used the "DQer" function to break up Q contigs (contigs containing more than 10%
Questionable clones), which resulted in 16 contigs and 62% (302) singletons. The stringency was lowered in a
stepwise manner and the project finished with an optimal lower cut-off value of 1 × 10^-15, which resulted in 35
contigs with 38% (186) singletons that could not be incorporated into other contigs (Additional file 1:
Table S1). This resulted in an assembly with a low number of Q clones (0-5 clones in each contig). Six B genome
contigs, which had the highest number of overlapping BACs (Figure 1) and were in agreement with the Southern
hybridisation results, were considered representative of the A. thaliana target region. The assembled contigs
appeared to follow the order of the genes as represented in the A. thaliana genome (Additional file 1: Table S2
and Figure 1). Three B genome regions were homologous to A. thaliana chromosome 5 (contigs A, B and C) while the
other three were homologous to A. thaliana chromosome 4 (contigs D, E and F), and the contigs were named
according to the previous analysis of the Brassica A/C genomes [29,30]. The contig assignments were based on the
presence or absence of eight genes (At4g17260, At4g17300, At4g17380, At4g17440, At4g17650, At4g17750, At4g17760,
and At4g17800) that are present in the chromosome 4 region, but not in the chromosome 5 region [29,44]. At this
macro-level the only apparent major difference between the Brassica B genome and the A and C genomes was the
joining of contigs F and G in the B genome, which are physically separated in the A/C genomes (Figure 1).

Comparative organisation at the sequence level 

BACs were selected for sequencing in order to represent each of the six contiguous regions identified as being
homologous to the A. thaliana region (Figure 1). The BAC clones sequenced and their characteristics are detailed
in Table 2.

Orthologous genes between B. nigra and A. thaliana were identified using sequence similarity: each sequenced
contig was searched against the A. thaliana gene sequences (TAIR 10) using BLASTN, and the matches were verified
using ab initio prediction. The results are summarized in Figure 2 and show extensive conservation of gene
content and order across each set of related genome segments.
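
As an illustration of how gene order along a contig might be read from BLASTN results, the sketch below parses standard 12-column tabular output (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), keeps the best-scoring hit per contig and gene, and lists the genes in order of their position on the contig. The E-value threshold, the function name and the example rows are hypothetical and are not taken from the study.

```python
def gene_order_from_blast6(tabular_text, max_evalue=1e-10):
    """Return {contig: [gene IDs ordered by start position on the contig]}.

    Assumes the contig is the BLAST query and the A. thaliana gene is the
    subject; only the best-scoring hit per (contig, gene) pair is kept.
    """
    best = {}  # (contig, gene) -> (bitscore, start position on contig)
    for line in tabular_text.strip().splitlines():
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        contig, gene = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # discard weak matches
        key = (contig, gene)
        if key not in best or bitscore > best[key][0]:
            best[key] = (bitscore, min(qstart, qend))

    order = {}
    for (contig, gene), (_, start) in sorted(best.items(), key=lambda item: item[1][1]):
        order.setdefault(contig, []).append(gene)
    return order


# Hypothetical example rows (invented values, for illustration only).
rows = [
    ("contigD", "AT4G17300", "88.5", "900", "100", "3", "1000", "1900", "1", "900", "1e-150", "800"),
    ("contigD", "AT4G17260", "90.1", "700", "69", "0", "200", "900", "1", "700", "1e-120", "650"),
    ("contigD", "AT4G17300", "80.0", "300", "60", "0", "50000", "50300", "1", "300", "1e-20", "150"),
    ("contigD", "AT4G17340", "70.0", "100", "30", "0", "3000", "3100", "1", "100", "1e-3", "40"),
]
example = "\n".join("\t".join(row) for row in rows)
result = gene_order_from_blast6(example)
print(result)
```

Comparing the resulting order with the order of the same genes on the A. thaliana chromosome gives a quick first view of collinearity; a real analysis would also need to handle multi-exon hits and paralogues, which this sketch ignores.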

Five examples of gene tandem duplication were identified in B. nigra compared to A. thaliana: At5g47150 in
contig A, At4g17650 and At4g17570 in contig F, and At4g17350 and At4g17600 in contig D (Figure 2). In the six
contigs, 24 genomic insertions were detected which disrupted the collinearity (Figure 2), one of which was also a
tandem repeat, At4g20140 in contig F. In five instances genes that were duplicated or triplicated in A. thaliana
were present in only one copy in the B. nigra genome, apart from At5g47350 in B genome contig C where the
homologous gene family was represented by two copies (Figure 2). The most significant micro-rearrangement that
was specific to the B genome was observed in contig B, where the region containing At5g47480 and At5g47490 was
duplicated and inverted (Figures 2 and 3). Most of these rearrangements were specific to one contig, and one
genome, apart from the triplicated gene family of At4g17470, At4g17480 and At4g17483, which was represented by
one gene in all three genomes [32,34]. Contig E in B. nigra showed the highest relative incidence of gene
insertion or transposition with concomitant absence of conserved gene content, with one region, between
At4g17380 and At4g17480, being disrupted by the presence of six genes from unrelated regions of the A. thaliana
genome. The same region showed a similar level of discontinuity in the C genome, but the A genome, although
missing conserved genes, showed little evidence of gene insertion. The analysis at the sequence level
demonstrated that, unlike the A and C genomes where an inversion, as observed in the B. rapa genome sequence
[45], has separated contigs F and G, the B genome has maintained collinearity with A. thaliana chromosome 4. The
set of genes from A. thaliana chromosome 2 at the top of contigs A and B of B. nigra indicates a rearrangement
end point. This discontinuity in the collinearity was previously observed in B. rapa and indicates a break in the
ancestral conserved block structure [46].

Table 2 List of sequenced B. nigra BAC clones

B. nigra BAC   Genome segment      Size (bp)   Accession number
BGH184:O10     B. nigra Contig A   116,795     KC96003
BGH31:E17      B. nigra Contig A   257,762     KC95996
BGH93:I17      B. nigra Contig B   258,014     KC96000
BGH8:I19       B. nigra Contig C   127,829     KC95992
BGH34:M17      B. nigra Contig C   158,918     KC95997
BGH24:O18      B. nigra Contig D   139,013     KC95995
BGH214:J17     B. nigra Contig D   384,919     KC96005
BGH214:O16     B. nigra Contig D   140,361     KC96006
BGH20:O8       B. nigra Contig E   142,538     KC95994
BGH12:A24      B. nigra Contig E   121,495     KC95993
BGH148:I13     B. nigra Contig F   203,952     KC96002
BGH59:I14      B. nigra Contig F   169,401     KC95998
BGH64:E22      B. nigra Contig F   124,814     KC95999
BGH107:K14     B. nigra Contig F   170,106     KC96001
BGH206:P22     B. nigra Contig F   122,354     KC96004

Comparative alignment of homoeologous segments in Brassica diploid genomes

The overall similarity of each of the homoeologous regions of the three diploid genomes, A, B and C, was compared
at the nucleotide level using MUMmer [47], as shown in Figure 3. In general, as expected from the observed gene
content, the six regions show good conservation between the three genomes. However, the sequence alignments
uncover instances of genome duplications, deletions/expansions and inversions that are specific to each of the
three genomes. Contigs E and F show the most distinct differences. Contig E contains a large number of genes from
non-collinear regions of the genome and in addition shows further expansion through repeat element
proliferation. Interestingly, the pattern of genome expansion is conserved between the B and C genomes; however,
the types of genetic elements represented are not (Figure 2). Contig F has seen expansion in the C genome through
tandem duplication of members of a gene family of proline-rich genes. The C genome has more observed instances of
genome expansion compared to either the A or B genomes, which is particularly notable in contig B (Figure 3 and
Table 3). The remarkable expansion of this region in the C genome is due to the insertion of a large
non-collinear segment (~200 kb) between At5g47070 and At5g47100.

Level of gene conservation and fractionation in triplicated regions

As with previous comparative analysis, conservation was inferred from the presence of significant sequence
similarity between genic regions in the three genomes. However, this does not extend to the maintenance of entire
gene structures, and fractionated exon loss is a common feature of plant genome evolution [48]. The retention of
exonic regions for the conserved A. thaliana genes was calculated for each Brassica genome contig (Table 3 and
Additional file 1: Table S3). The results as shown in Table 3 indicate that for five out of the six regions the B
genome has maintained a higher number of complete gene copies. For those B genome contigs homologous to
A. thaliana chromosome 4, two of the regions have a lower level of fractionation compared to the third (74%
retention of exonic regions in contigs F and D, 38% in contig E). While the A and C genomes show a similar trend,
the expected stronger conservation of one of the three duplicated regions is more apparent. The two triplicated
regions of the B genome studied follow the pattern of genome maintenance as observed previously for the A genome
where one region, the 'least fractionated' (LF), is more highly conserved than either of the other two, 'more
fractionated' (MF1) and MF2 [45]. However, the two MF regions of the B genome appear to have incurred fewer
deletion events compared to the A and C genomes, as shown by a higher prevalence of gene retention (Table 3 and
Additional file 1: Table S3). It has been suggested previously that higher levels of fractionation can be the
result of on-going transposable element activity [49]; however, although in general the C genome shows both an
expansion in genome size and repetitive element composition, this is not true for the A genome (Table 3).

Figure 2 Alignment of conserved genes between homologous regions of three Brassica genomes and chromosomes 4 and 5 of A. thaliana.

Figure 3 Alignment of B. nigra contigs with their homologous segments in the A and C genomes as found by MUMmer.

Table 3 Level of fractionation in the triplicated regions of the three Brassica genomes

Region        Size (bp)  Retained exons (%)  Retained genic regions (%) 1  Predicted genes (%) 2   Ab initio genes  Gene density (bp/gene)

Contig A (35 genes in all 3)
Arabidopsis   221029     388                 67                            -                       -                -
A Genome      311929     190 (48.97)         46 (68.66)                    29 (43.28)              72               4332.35
B Genome      258940     207 (53.35)         44 (65.67)                    35 (52.24)              71               3647.04
C Genome      586573     184 (47.42)         43 (64.18)                    28 (41.79)              159              3689.14

Contig B (47 genes in all 3)
Arabidopsis   341067     496                 -                             -                       -                -
A Genome      365330     270 (54.44)         55 (61.80)                    41 (46.07)              82               4455.24
B Genome      258875     288 (58.06)         62 (69.66)                    46 (51.69)              64               4044.92
C Genome      828528     287 (57.86)         55 (61.80)                    40 (44.94)              189              4383.75

Contig C (31 genes in all 3)
Arabidopsis   182585     243                 42                            -                       -                -
A Genome      154252     174 (71.60)         34 (80.95)                    25 (59.52)              32               4820.38
B Genome      175460     163 (67.08)         36 (85.71)                    31 (73.81)              38               4617.37
C Genome      398219     181 (74.49)         34 (80.95)                    29 (69.05)              102              3904.11

Contig D (62 genes in all 3)
Arabidopsis   327453     456                 -                             -                       -                -
A Genome      278470     243 (53.29)         64 (72.73)                    42 (47.73)              66               4219.24
B Genome      386079     336 (73.68)         73 (82.95)                    48 (54.55)              115              3357.21
C Genome      362518     240 (52.63)         64 (72.73)                    45 (51.14)              99               3661.80

Contig E (12 genes in all 3)
Arabidopsis   112818     222                 34                            -                       -                -
A Genome      132246     92 (41.44)          17 (50.00)                    14 (41.18)              38               3480.16
B Genome      159548     85 (38.29)          15 (44.12)                    14 (41.18)              42               3798.76
C Genome      234259     82 (36.94)          16 (47.06)                    13 (38.24)              70               3346.56

Contig F (76 genes in all 3)
Arabidopsis   387646     503                 96                            -                       -                -
A Genome      510517     366 (72.76)         80 (83.33)                    65 (67.71)              121              4219.15
B Genome      450974     371 (73.76)         84 (87.50)                    70 (72.92)              113              3990.92
C Genome      543877     363 (72.17)         86 (89.58)                    62 (64.58)              159              3420.61

Number of potential TEs (column only partly recovered; values listed in their extracted order): Contig A: 3, 5, 32; Contig B: 14, 1, 42; Contig C: 0, 2, 14; Contig D: 7, 10; Contig E: 12; Contig F: 37.

1. Indicates the presence of complete and partial genes as determined by significant homology to any region of the annotated genes in A. thaliana.
2. Indicates homology to complete A. thaliana genes based on the maintenance of intron/exon structure.
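
The percentages in Table 3 can be checked by dividing each retained count by the corresponding A. thaliana total, and the gene density values are consistent with region size divided by the number of ab initio genes (that is, base pairs per gene). The sketch below recomputes the contig D row under those assumptions; the helper names are ours and the calculation is illustrative rather than a description of the original pipeline.

```python
def retention_pct(retained, total):
    """Percentage of A. thaliana features (e.g. exons) retained in a Brassica contig."""
    return 100.0 * retained / total


def bp_per_gene(region_size_bp, n_genes):
    """Average spacing of ab initio predicted genes, in base pairs per gene."""
    return region_size_bp / n_genes


# Contig D: 456 A. thaliana exons; retained exons in the A, B and C genomes (Table 3).
arabidopsis_exons = 456
retained_exons = {"A": 243, "B": 336, "C": 240}

for genome, count in retained_exons.items():
    print(f"{genome} genome: {retention_pct(count, arabidopsis_exons):.2f}% exons retained")

# B genome contig D: 386,079 bp and 115 ab initio genes (Table 3).
print(f"B genome contig D: {bp_per_gene(386_079, 115):.2f} bp per gene")
```

The same two helpers can be applied to any other row of the table to compare fractionation between the least and more fractionated regions.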
Timing of genome divergence

Synonymous base substitution rates (Ks values) were calculated for variable numbers of conserved genes across
each genomic region of the three Brassica genomes and A. thaliana (Table 4). The mean values of Ks and divergence
times are in agreement with values reported previously for the A and C genomes [25,33]. The divergence time of
the ancestral Brassica genome from A. thaliana was estimated to range from ~13.9 Mya in contig A to ~17 Mya in
contig D, which is similar to previous estimates of 14.5-20 Mya [24]. Contigs homologous to A. thaliana
chromosome 4 in all three Brassica genomes appeared to be significantly older than those homologous to
chromosome 5 (p < 0.0001), which could suggest that, as in many plant species, the whole genome duplication event
in A. thaliana was the result of allopolyploidy. Previous estimates suggest B. rapa and B. oleracea diverged at
around 3.7 Mya [26] and our calculations are in accordance, with an average value of 3.2 Mya. The divergence time
of B. nigra from the B. rapa/oleracea lineage was estimated previously, based on limited available sequence data,
to be 7.9 Mya [25]. As detailed in Table 4, the calculated divergence times for B. nigra from B. rapa/oleracea
for the target regions varied, with an average of 6.2 Mya, although lower values were observed for contig E that
could result from a smaller number of genes being available for comparison. It was assumed that the whole genome
triplication (WGT) event observed in the B genome was shared with the A and C genomes, and calculation of Ks
values for conserved genes across genomic segments within each of the Brassica genomes corroborated this
assumption, with no significant difference between the average age of divergence of the WGT (11.6 ± 3.4 Mya).
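
Divergence times of this kind are commonly derived from Ks with the relation T = Ks / (2r), where r is the synonymous substitution rate per site per year. The sketch below applies that relation with r = 1.5 × 10^-8, a rate often used for Brassicaceae; this rate is our assumption, since the value used by the authors is not stated in this section, and the function name is hypothetical.

```python
# Assumed synonymous substitution rate (substitutions per site per year).
# This value is an assumption for illustration, not taken from the text above.
SUBSTITUTION_RATE = 1.5e-8


def divergence_time_mya(ks, rate=SUBSTITUTION_RATE):
    """Convert a Ks value into a divergence time in millions of years: T = Ks / (2r)."""
    return ks / (2.0 * rate) / 1e6


# Mean Ks values reported for contig A in Table 4.
examples = [
    ("B genome vs A genome", 0.18),
    ("A genome vs C genome", 0.09),
    ("B genome vs Arabidopsis", 0.42),
]

for label, ks in examples:
    print(f"{label}: Ks = {ks:.2f} -> ~{divergence_time_mya(ks):.1f} Mya")
```

Under this assumed rate the estimates fall close to the tabulated values (for example about 14 Mya for Ks = 0.42, against 14.30 Mya in Table 4); the small differences would be expected if the published times were averaged per gene from unrounded Ks values.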

Discussion 

In the 1950s, when B. juncea replaced B. nigra as the mustard crop of choice in Asia, B. nigra was effectively
abandoned with regard to crop improvement [50] and has been the subject of limited breeding, genetics and
genomics research compared to the other two diploid species of U's triangle [51]. However, the Brassica B genome
has been recognised as a useful source of novel alleles for various traits of interest, in particular disease
resistance and tolerance to abiotic stress, and numerous attempts have been made to transfer these traits to the
Brassica A and C genomes, with varying success [9,13,14,17,52-54]. An improved understanding of the genome
structure and gene composition of the Brassica B genome could provide insights into its relationship among the
Brassica species and could potentially facilitate exploitation of this important resource. The current study
benefited from previous research that had studied extensively the organisation of the Brassica A and C genome
regions homologous to a 222 kb region of A. thaliana chromosome four, which was itself duplicated on chromosome
five [44].



Table 4 Pairwise divergence time of genome segments based on synonymous base substitution rates

                                     A genome                     C genome                     Arabidopsis
Region              Number of genes  Average Ks    Mya            Average Ks    Mya            Average Ks    Mya
B genome Contig A   15               0.18 ± 0.07   6.073 ± 2.18   0.19 ± 0.06   6.60 ± 2.04    0.42 ± 0.10   14.30 ± 3.28
A genome Contig A                    -             -              0.09 ± 0.05   3.22 ± 1.68    0.41 ± 0.09   13.68 ± 2.91
C genome Contig A                    -             -              -             -              0.41 ± 0.09   13.96 ± 3.16
B genome Contig B   25               0.20 ± 0.07   6.92 ± 2.25    0.22 ± 0.06   7.36 ± 2.10    0.42 ± 0.09   14.20 ± 2.99
A genome Contig B                    -             -              0.09 ± 0.05   3.20 ± 1.65    0.43 ± 0.10   14.44 ± 3.24
C genome Contig B                    -             -              -             -              0.45 ± 0.10   15.04 ± 3.31
B genome Contig C   16               0.18 ± 0.06   6.30 ± 2.01    0.18 ± 0.07   6.06 ± 2.41    0.45 ± 0.13   15.07 ± 4.45
A genome Contig C                    -             -              0.10 ± 0.06   3.47 ± 1.85    0.43 ± 0.10   14.54 ± 3.41
C genome Contig C                    -             -              -             -              0.43 ± 0.11   14.39 ± 3.61
B genome Contig D   28               0.20 ± 0.11   6.81 ± 3.67    0.19 ± 0.09   6.34 ± 2.96    0.50 ± 0.14   16.69 ± 4.69
A genome Contig D                    -             -              0.10 ± 0.06   3.59 ± 1.85    0.52 ± 0.18   17.40 ± 5.86
C genome Contig D                    -             -              -             -              0.50 ± 0.18   16.99 ± 5.99
B genome Contig E   7                0.15 ± 0.04   5.17 ± 1.46    0.14 ± 0.04   4.91 ± 1.38    0.48 ± 0.19   16.03 ± 6.44
A genome Contig E                    -             -              0.08 ± 0.05   2.97 ± 1.51    0.46 ± 0.18   15.47 ± 6.12
C genome Contig E                    -             -              -             -              0.46 ± 0.18   15.59 ± 6.12
B genome Contig F   41               0.19 ± 0.05   6.46 ± 1.91    0.18 ± 0.06   6.32 ± 1.98    0.47 ± 0.14   15.93 ± 4.70
A genome Contig F                    -             -              0.08 ± 0.03   2.76 ± 1.11    0.48 ± 0.14   16.10 ± 4.64
C genome Contig F                    -             -              -             -              0.47 ± 0.14   15.83 ± 4.65

The Brassica B genome was separated previously from the A/C genome lineage in phylogenetic analyses [23,55].
Analysis of the six regions within the Brassica B genome compared to both the A and C genomes allowed a more
accurate estimate of the timing for divergence of the two lineages, ~6.2 Mya (± 2.19), compared to those
previously published, which ranged widely from 7.9-14.6 Mya [25] to 5-10 Mya [56]. The pattern of genome
rearrangements and gene deletions that differentiate the Brassica genomes over this period has led to the extant
species. The Brassica B genome, as for the other diploid Brassica genomes, retained three genomic copies of each
A. thaliana region, reflecting the underlying whole genome triplication (WGT) event that distinguishes the
Brassiceae tribe [25]. The same rate of synonymous substitution was observed between the triplicated copies of
the three Brassica genomes and A. thaliana (Table 4), and similarly the intragenomic rate (A: Ks = 0.323 ± 0.12,
B: Ks = 0.353 ± 0.08, C: Ks = 0.353 ± 0.11) across the triplicated regions within the genomes was relatively
equivalent, suggesting the B genome evolved from the same paleohexaploidy event as the A and C genomes. As
reported previously in B. oleracea, B. rapa and B. napus [32,34] and shown in Figure 2, there is a high level of
conservation for the composition and order of genes among the three Brassica diploid genomes compared to
A. thaliana. Observed breaks in collinearity were due predominantly to minor rearrangements, such as inversions,
duplications, small insertions and deletions, although one large insertion (> 200 kb) of a non-collinear segment
in the C genome was found. One major chromosomal rearrangement, a large inversion relative to A. thaliana
chromosome four, differentiated the A/C lineage from the Brassica B genome for one of the triplicated copies,
which was also apparent from the genome sequence of B. rapa [45]. Genetic mapping of the B genome in B. juncea
has indicated that a number of large-scale chromosomal rearrangements may differentiate the Brassica A and B
genomes [22], although this study also indicated that at least three of the B genome chromosomes were virtually
collinear with their A genome homoeologue [22]. In addition, it identified conservation of ancestral Brassicaceae
block structures across the three diploid Brassica genomes. Interestingly, as indicated in Figure 2, endpoints
delineating ancestral Brassicaceae blocks, in this case between A. thaliana chromosomes two (block K) and five
(block V), were found to be conserved between at least two of the triplicated copies in both the Brassica A (on
linkage groups A02 - contig L, and A09 - contig B) and B genomes [45,46]. The identification of such conserved
ancestral rearrangements will further assist in defining the structure of the progenitor genome of the Brassiceae
tribe, which is proposed to differ from that of A. thaliana [57].



Although the regions studied demonstrated marked conservation relative to A. thaliana, each species and each
genomic region were characterized by the presence of gene sequences from non-collinear regions, which were rarely
shared. However, there appear to be regions that are more sensitive to such alterations, and although they are
common to all genomes in lacking collinearity (for example in contig E between At4g17480 and At4g17390) the
changes observed were specific to each. In addition, each of the regions across the genomes varied in size
(Table 3). The B. nigra B genome has been reported to be 632 Mb (0.65 pg), smaller than B. oleracea at 696 Mb
(0.71 pg) but larger than B. rapa at 529 Mb (0.54 pg) (Johnston et al., 2004). It had previously been shown that
the C genome was expanded compared to the A genome for the targeted regions, which would be in accordance with
current genome estimates; however, our estimates for this region place the B genome closer in size to that of
B. rapa.

There has been considerable interest in attempting to 
transfer genomic regions from the B genome into the A 
and C genomes of B. napus [4,13,14,58]; however, this 
has been largely unproductive due to limited or no re- 
combination occurring between introduced B genome 
chromosomes and their supposed orthologues. Even the 
use of artificial resynthesis to develop new allopoly- 
ploids, which greatly enhances the level of recombin- 
ation between the Brassica A and C genomes [11], has 
been ineffectual [59]. Yet at the genetic level [22] and, as
shown here, at the sequence level there is marked
conservation across the three Brassica genomes; in fact, the
level of karyotype and microstructure divergence across
the three genomes is comparable. The question is why
aberrant homoeologous recombination is so frequently
observed in newly resynthesized B. napus (approximately
10% of the total observed recombination) but absent in
equivalent resynthesized B. juncea [11,59]. Although it is
possible that there is a genetic barrier to recombination,
such as that found in wheat [60] or proposed for B.
napus [18], the inability to break down this control would
suggest either that limited or no variation exists for the
trait or that, unlike the progenitors of wheat, the B genome
diploid maintains activity for a trait which has value only
in the polyploid state. If the barrier is instead physical,
then chromosomal rearrangements not captured in previous
genetic mapping studies presumably differentiate the
genomes; one such example would be the inversion found
here that is specific to the A/C lineage. Inversions have
been proposed to have a significant impact on plant
evolution [61] and could have led to the reproductive
isolation and speciation of the B genome.

Although the current analysis focuses on two triplicated
regions of the Brassica B genome, it provides insights
into the gene retention pattern across the wider
genome, since the A genome regions studied here reflect
the pattern found in the B. rapa genome sequence [45].
The publication of the first Brassica genome sequence,
that of B. rapa, provided extensive evidence of gene
fractionation following the WGT and, perhaps most
interestingly, the bias in gene retention among the three
genomic copies led to the two-step polyploidy hypothesis with
fractionation between steps [45]. This hypothesis, in which
the paleohexaploid was formed by hybridisation between a
diploid and an established tetraploid, has since been
rigorously tested; the retention of exonic regions has been
studied in detail for the A genome [62], indicating that
deletions are the major mode of gene inactivation and hence
an evolutionary driver in Brassica species. For the studied
regions, the B genome appears to follow the same gene
fractionation pattern as the A genome, with one of the three
genomic copies showing a higher
level of conservation (Figure 4; Additional file 1: Table S3).
The observed level of gene loss or fractionation for the 
triplicated regions at each important branch point in the 
evolution of the studied genomes is indicated in Figure 4. 
The highest percentage of fractionation appears to have
occurred prior to the separation of the three genomes, that
is, where gene loss is shared across all three genomes.
Although the general pattern of fractionation holds true,
species-specific differences can be created through the
types of genes that are maintained. For example, in the
regions homologous to A. thaliana chromosome five, no copies
of two annotated disease resistance genes, At5g47250
and At5g47260, are found in the A and C genomes, yet
multiple full or partial copies are maintained for each in the
three homologous B genome regions. In addition, local
insertion of a further annotated disease resistance gene
(At1g59620) in this region of the B genome for contig A
indicates a resistance cluster specific to the B genome.

Conclusions 

Analysis of homologous regions of three closely related 
Brassica diploid genomes has identified extensive con- 
servation of gene content and order, which is not 
reflected in the established phylogenetic relationship of 
the three species. The genome of B. nigra, based on its
inability to establish effective pairing structures with the
A and C genomes, might have been expected to have
undergone more rearrangements at the micro-level. That this
was not observed suggests that major chromosomal events such
as the observed inversion could have played a significant
role in the speciation of the B genome. The B genome is
likely to be an excellent source of novel genes for a 
number of important traits and capturing this diversity 
through genome sequencing could prove important for 
future crop improvement. 



Figure 4 Phylogenetic tree of the Brassica B and A/C lineage. The percentage of fractionation for the three sub-genomes at each node is
specified. The contigs are ordered according to the overall level of fractionation from least (LF) to most fractionated (MF2) (Table 3).

[Figure 4 labels as recovered from the image, in order of appearance]
Node ages: 15 MYA, 6.2 MYA, 3.2 MYA
Fractionation (%), given as LF - MF1 - MF2:
  At5 13.6 - 18.0 - 24.0    At4 7.3 - 16.0 - 47.0
B Genome
  At5 6.8 - 7.5 - 11.0      At4 6.5 - 10.0 - 0.0
A Genome
  At5 4.5 - 6.0 - 5.6       At4 3.1 - 1.1 - 8.8
  At5 2.3 - 3.0 - 2.2       At4 2.1 - 1.1 - 2.9
C Genome
  At5 2.2 - 3.0 - 3.4       At4 0.0 - 1.1 - 5.9
Arabidopsis
Methods

Construction of BAC library 

Seedlings of a doubled haploid B. nigra (DH No 100,
provided by Alison Ferrie, National Research Council of
Canada Plant Biotechnology Institute (NRC-PBI), Saskatoon)
were grown in a controlled environment cabinet at a
constant 20°C with a 16 hour photoperiod until the 6-8 leaf
stage. Plants were placed in the dark for 3 days prior to
tissue harvest; leaf tissue was flash frozen in liquid nitrogen
and shipped to BioS&T, Montreal, Canada
(http://www.biost.com/) for library construction. Large insert
genomic DNA was ligated into the HindIII site of the
pIndigoBAC-5 vector (Epicentre, Madison, WI, US). The library
clones are prefixed "BGH" and are arrayed in 222 384-well
plates, with an average insert size of 130 kb, representing
almost 20× coverage of the Brassica B genome, which has an
estimated size of 632 Mb based on flow cytometry [40].
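
The coverage figure can be checked with simple arithmetic. The sketch below is only an illustration: it assumes that every well holds one clone carrying an insert of the average size, which the text does not state, and the function name is hypothetical.

```python
def library_coverage(n_plates, wells_per_plate, mean_insert_kb, genome_mb):
    """Return (number of clones, fold coverage) for an arrayed BAC library.

    Assumption: every well holds exactly one clone with an insert of the
    mean size (empty wells and insert-less clones are ignored).
    """
    n_clones = n_plates * wells_per_plate
    total_insert_mb = n_clones * mean_insert_kb / 1000.0
    return n_clones, total_insert_mb / genome_mb


# Values taken from the library description above.
n_clones, coverage = library_coverage(222, 384, 130, 632)
print(n_clones, round(coverage, 1))
```

Under these assumptions the library holds 85,248 clones and gives roughly 17.5-fold coverage, in line with the "almost 20×" stated above.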

Primary BAC library screening 

The library was gridded in replicate on five 22.2 cm²
Hybond™-XL (GE Healthcare, Uppsala, Sweden) membranes
and screened by colony hybridisation using probes
of seventeen genes from the A. thaliana 222 kb target
region of chromosome 4 [29] to identify homologous clones
(Table 1). Preparation of the A. thaliana gene-specific
probes and their sequences are detailed in O'Neill and
Bancroft (2000). These probes were labelled by the random
priming method using the RediPrime II kit (GE
Healthcare, Uppsala, Sweden) according to the
manufacturer's instructions. Hybridisation was carried out at
65°C for 16 h in QuikHyb Hybridisation Buffer (Agilent
Technologies, Santa Clara, CA) according to the manufacturer's
instructions. The membranes were washed twice for
10 min at 65°C followed by three washes for 10 min at
room temperature in 2× SSC and 0.1% SDS.

BAC DNA preparation 

DNA from BAC clones identified in the primary screen
was prepared in one of two ways. For DNA digestion,
HICF and BAC end sequencing, BAC DNA was prepared
using standard alkaline lysis [63]. For full length
sequencing, DNA was isolated using the Qiagen Large
Construct Kit (Qiagen, Valencia, CA) according to the
manufacturer's instructions. The integrity and size of the
clones were confirmed by digestion with NotI and
resolution by pulsed-field gel electrophoresis (PFGE) on a
1% agarose gel in 0.5× TBE at a 120° angle for 16 hours
at 14°C with a 0.1 - 40.0 second switch time. The sizes of
the fragments were estimated using the Lambda Ladder
PFG and MidRange II PFG markers (NEB, Ipswich, MA).

Southern blot analysis 

Purified BAC DNA was digested with HindIII
and separated on 1% agarose in 1× TAE at 5 V for
18 hours. The digested DNA was blotted onto Hybond-XL
membranes and probed as above. Hybridisation was
done at 65°C for 16 h in modified Church buffer [64]
and the membranes were washed twice for 10 min at
room temperature in 2× SSC and 0.1% SDS.

BAC sequence analyses 

BAC end sequencing was completed for 1,110 positive
clones using modified BigDye Terminator v3.1 and
BigDye XTerminator purification kit protocols (Life
Technologies, Carlsbad, CA). Sequencing reactions were
run on an AB 3730xl at the NRC DNA Technologies
Laboratory in Saskatoon. The resultant sequence data were
compared to themselves and to A. thaliana using BLASTN
with default parameters and an E value cut-off of
1 × 10⁻¹⁰. Full length BAC sequences were generated with
Roche 454 FLX sequencing, also at NRC. The 15 BACs
were each indexed and sequenced in one half of a 454
plate. The sequences were assembled into large contigs
using Newbler v2.6 (Roche Diagnostics).
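
The E value cut-off applied to the BLASTN comparisons can be illustrated with a short filter over tabular BLAST output. This is a sketch, not the procedure used for the analyses: it assumes the standard 12-column tabular format (E value in the eleventh column), and the query and subject names in the example are hypothetical.

```python
import csv
import io


def filter_hits(handle, max_evalue=1e-10):
    """Keep hits from 12-column tabular BLAST output with E value <= max_evalue.

    Returns a list of (query_id, subject_id, evalue) tuples.
    """
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])  # column 11 holds the E value
        if evalue <= max_evalue:
            hits.append((row[0], row[1], evalue))
    return hits


# Hypothetical BAC end reads compared against A. thaliana gene models.
example = io.StringIO(
    "BGH_end1\tAT4G17480\t88.5\t400\t46\t0\t1\t400\t101\t500\t1e-120\t430\n"
    "BGH_end2\tAT4G17390\t80.0\t60\t12\t0\t1\t60\t10\t69\t0.002\t40.1\n"
)
for query, subject, evalue in filter_hits(example):
    print(query, subject, evalue)
```

Only the first hypothetical hit passes the 1 × 10⁻¹⁰ threshold; the second is discarded as too weak.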

SNaPshot fingerprinting 

The fingerprinting reaction was performed following
Luo et al. (2003) with minor modifications recommended
by the Arizona Genomics Institute. The BAC DNA was
incubated with the digestion and labelling mix for one
hour at 37°C and labelling was performed for one hour at
65°C. The size standard GeneScan LIZ-1200 (size range
from 20 to 1200 bp) was added to each sample prior to
loading on an AB 3100 for capillary electrophoresis using
the DS-02 dye set. Peak heights, areas and sizes were
collected by GeneMapper® software v3.7 and converted into
FPC-readable format after editing with GenoProfiler®
v2.1 (http://wheat.pw.usda.gov/PhysicalMapping/; [41]).

Data editing operations and FPC contig assembly 

The editing process included removal of vector bands
and removal of clones for which the reaction failed,
which lacked an insert or which were suggestive of cross
contamination [65]. The files generated by GenoProfiler were
transferred to FPC v9.3 (www.agcol.arizona.edu/software/fpc;
[42]). Initially, a tolerance value of 3.0 and a
Sulston cut-off score of 1 × 10⁻³⁵ were used to assemble
contigs automatically; the stringency was later reduced
as described in the Results section. The "DQer" function of
FPC was used to reassemble contigs with more than 10
Q clones. The resulting contigs were merged by the
"End to End" auto merge function with a minimum of
two matching ends. The remaining singletons were
merged to form contigs by the "Auto merge/add"
function. The fingerprints of clones in the merged contigs
were then analysed manually and compared to contigs
assembled based on Southern blot analyses to finalize
the contigs [31].
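
To give a sense of what the tolerance and cut-off control, the sketch below computes a Sulston-type score, the probability that two clones share at least a given number of bands by chance. It assumes the binomial form in which this score is usually described, and is not taken from the FPC source code. The band counts and the gel length are hypothetical values chosen for illustration.

```python
from math import comb


def sulston_score(n_low, n_high, shared, tolerance, gel_length):
    """Probability of at least `shared` coincidental band matches.

    n_low, n_high : band counts of the clone with fewer / more bands
    tolerance     : allowed size difference for two bands to match
    gel_length    : size range over which bands are spread (assumed value)
    """
    b = 2.0 * tolerance / gel_length      # chance that two random bands match
    p_no_match = (1.0 - b) ** n_high      # a band matches none of n_high bands
    p_match = 1.0 - p_no_match
    return sum(
        comb(n_low, m) * p_match ** m * p_no_match ** (n_low - m)
        for m in range(shared, n_low + 1)
    )


# Hypothetical clone pair: 100 and 120 bands, 70 of them shared.
score = sulston_score(100, 120, 70, tolerance=3.0, gel_length=12000)
print(score < 1e-35)  # True means the pair would pass the initial cut-off
```

A lower cut-off demands more shared bands before two clones are joined, which is why relaxing the stringency allows additional, weaker overlaps into the contigs.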
Sequence analyses

The genome assembly of B. rapa (http://brassicadb.org)
and a preliminary assembly of B. oleracea (TO1000
sequencing consortium, unpublished data) were used to
identify extended A and C genomic regions equivalent
to the sequenced B genome regions. The A. thaliana
genome sequence (TAIR10; www.arabidopsis.org) was
compared to the Brassica sequences using BLASTN
with default parameters and an Expect value cut-off of
1 × 10⁻¹⁰. GlimmerHMM [66] was used to predict genes using
training data from A. thaliana. MUMmer and NUCmer
were used to align contigs of the A and C genomes with
those of the B genome and the alignments were visualised
using mummerplot [47].

Calculation of Ks values and phylogenetic tree

Homologous predicted cDNA sequences from the Brassica
genomes and A. thaliana were aligned using
ClustalW version 1.83 [67]. Ks and Ka values were
calculated in pairwise mode using the CODEML program
in the PAML package version 4.4 [68]. The commonly
adopted estimate of mutation rate of 1.5 × 10⁻⁸
synonymous substitutions per site per year [69] was used to
estimate the divergence time of the three genomes.
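
The conversion from Ks to divergence time is conventionally T = Ks / (2r), where r is the synonymous substitution rate per site per year and the factor of two accounts for substitutions accumulating along both lineages. The sketch below assumes that conventional relation; the Ks value in the example is illustrative and is not a value reported here.

```python
def divergence_time_years(ks, rate=1.5e-8):
    """Estimate divergence time from a pairwise Ks value.

    Uses T = Ks / (2 * rate): substitutions accumulate independently on
    both lineages since their split, hence the factor of two.
    """
    if ks < 0:
        raise ValueError("Ks must be non-negative")
    return ks / (2.0 * rate)


# Illustrative Ks only (hypothetical, not taken from the results).
t = divergence_time_years(0.186)
print(f"{t / 1e6:.1f} million years")
```

With the rate of 1.5 × 10⁻⁸ used above, a Ks of about 0.186 would correspond to a split roughly 6.2 million years ago.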

The alignments of the orthologous gene sets from the four
species were trimmed for regions with gaps and missing data
using trimAl [70]. The alignments of 126 gene sets were
concatenated using the Phyutility software [71], giving
a total alignment length of 120,936 bp, which was then
used for the phylogenetic analysis performed with
FastTree 2.1.3 [72] and processed using Dendroscope
[73]. Robustness of the phylogenetic inference was assessed
by bootstrap analysis of 1000 resampled
alignments.
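
Bootstrap analysis of this kind rests on resampling alignment columns with replacement, so that each replicate has the same length as the original alignment. The sketch below shows only that resampling step on a toy alignment; the sequences and names are hypothetical, and tree building on each replicate is left to the phylogenetics software.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Resample alignment columns with replacement.

    alignment : dict mapping sequence name -> aligned sequence (equal lengths)
    rng       : random.Random instance, passed in for reproducibility
    """
    n_cols = len(next(iter(alignment.values())))
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {
        name: "".join(seq[i] for i in picked)
        for name, seq in alignment.items()
    }


# Toy four-taxon alignment (hypothetical sequences).
toy = {"A": "ATGCCGTA", "B": "ATGCTGTA", "C": "ATGCCGTT", "At": "ATGATGTA"}
rng = random.Random(1)
replicates = [bootstrap_replicate(toy, rng) for _ in range(1000)]
print(len(replicates), len(replicates[0]["A"]))
```

The support for a branch is then the proportion of replicate trees in which that branch is recovered.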

Additional file 



Additional file 1: Table S1. Overlapping BACs based on Southern
hybridization analysis. +: positive hybridization with the gene-specific
probe, -: no hybridization. Highlighted cells are the fully sequenced BACs.
Table S2. Overlapping BACs based on the FPC analysis. Table S3.
Comparison of the level of exon retention across the three Brassica
genomes.